Extracting MFCC, F0 feature in Vietnamese HMM-based speech synthesis
Abstract
The HMM-based statistical speech synthesis method does not require a very large speech corpus to train the system. In this system, statistical modeling is used to learn the distributions of context-dependent acoustic vectors extracted from speech signals, where each vector contains a suitable parametric representation of one speech frame, and Vietnamese phonetic rules are applied to synthesize speech. The method presented in this paper allows accurate extraction of MFCC, F0 and tone features, as well as high-quality reconstruction of speech signals. Its suitability for high-quality HMM-based speech synthesis is demonstrated through subjective evaluations.
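The abstract does not specify how the per-frame MFCC and F0 parameters are computed, so the following is only a minimal numpy sketch of a generic front end, not the authors' method. Everything in it is an assumption: the helper names (`mfcc`, `estimate_f0`, `mel_filterbank`, `dct2`) are hypothetical, the settings (0.97 pre-emphasis, 25 ms Hamming window, 5 ms shift, 26 mel filters, 13 cepstral coefficients) are common defaults rather than values from the paper, and the F0 tracker is a plain autocorrelation method swapped in for illustration. It marks unvoiced frames as 0 Hz and makes no attempt to handle glottalized tones.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; the paper does not state which scale it uses).
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct2(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    out = x @ basis.T
    out[..., 0] *= np.sqrt(1.0 / n)
    out[..., 1:] *= np.sqrt(2.0 / n)
    return out


def mfcc(signal, sr, frame_len=0.025, frame_shift=0.005,
         n_fft=512, n_filters=26, n_ceps=13):
    """Hypothetical MFCC front end: returns an array of shape (n_frames, n_ceps)."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies (0.97 is an assumed coefficient).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen = int(round(frame_len * sr))
    fshift = int(round(frame_shift * sr))
    if len(emphasized) < flen:  # pad very short inputs to one full frame
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fshift
    # Overlapping frames, each multiplied by a Hamming window.
    idx = np.arange(flen)[None, :] + fshift * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum, mel filterbank energies, log compression, then DCT.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(np.maximum(mel_energy, 1e-10))  # floor avoids log(0)
    return dct2(log_mel, n_ceps)


def estimate_f0(signal, sr, frame_len=0.04, frame_shift=0.005,
                f0_min=60.0, f0_max=400.0, voicing_threshold=0.3):
    """Hypothetical autocorrelation F0 tracker; unvoiced frames get 0 Hz."""
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_len * sr))
    fshift = int(round(frame_shift * sr))
    lag_min, lag_max = int(sr / f0_max), int(sr / f0_min)  # assumes lag_max < flen
    f0 = []
    for start in range(0, len(signal) - flen + 1, fshift):
        frame = signal[start:start + flen]
        frame = frame - frame.mean()
        if np.dot(frame, frame) == 0.0:  # silent frame
            f0.append(0.0)
            continue
        # Autocorrelation at non-negative lags, normalized so lag 0 equals 1.
        ac = np.correlate(frame, frame, mode="full")[flen - 1:]
        ac = ac / ac[0]
        seg = ac[lag_min:lag_max + 1]
        lag = lag_min + int(np.argmax(seg))
        # The frame is called voiced only if the best peak is strong enough.
        f0.append(sr / lag if seg.max() > voicing_threshold else 0.0)
    return np.array(f0)
```

With a 16 kHz signal, `mfcc(x, 16000)` yields one 13-dimensional vector every 5 ms, and `estimate_f0(x, 16000)` yields an F0 track on the same 5 ms grid. Matching the frame rates lets the spectral and F0 streams be stacked frame by frame. A pure autocorrelation tracker like this one is easily confused by irregular glottal pulses, which is exactly the weakness that tone-aware F0 extraction for Vietnamese has to address.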
Similar resources
Intonation issues in HMM-based speech synthesis for Vietnamese
In an HMM-based Text-To-Speech system, contextual features, including phonetic and prosodic factors, have a significant influence on the spectrum, F0 and duration of the synthetic voice. This paper proposes prosodic features aimed at improving the naturalness of an HMM-based TTS system (VTed) for a tonal language, Vietnamese. The ToBI (Tones and Break Indices) features are used to learn two cru...
F0 parameterization of glottalized tones for HMM-based vietnamese TTS
A conventional HMM-based TTS system for Hanoi Vietnamese often suffers from hoarse quality due to the incomplete F0 parameterization of glottalized tones. As estimating F0 in glottalization is rather problematic for usual F0 extractors, we propose a pitch marking algorithm in which the pitch marks are propagated from regular regions of the speech signal to glottalized ones, from which the complete ...
Objective evaluation of HMM-based speech synthesis system using kullback-leibler divergence
In this paper, we propose a new objective evaluation method for hidden Markov model (HMM)-based speech synthesis using Kullback-Leibler divergence (KLD). The KLD is used to measure the difference between the probability density functions (PDFs) of the acoustic feature vectors extracted from natural training and synthetic speech data. For the evaluation, Gaussian mixture model (GMM) is used to m...
Generation of Fundamental Frequency Contours of Mandarin in HMM-based Speech Synthesis using Generation Process Model
The HMM-based speech synthesis system can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. In this approach, short term spectra, fundamental frequency (F0) and duration are generated by multi-stream HMMs separately. However, the quality of synthetic speech degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tr...
Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation
This paper examines F0 modeling and generation techniques for spontaneous speech synthesis. In a previous study, we proposed a prosodic-unit HMM in which the synthesis unit is defined as a segment between two prosodic events represented by a ToBI label framework. To take advantage of the prosodic-unit HMM, continuous F0 sequences must be modeled from discontinuous F0 data including unvoiced r...